to different datasets. In this paper, the algorithms investigated for mining frequent patterns are
Pre-post, Pre-post+, FIN, H-mine, R-Elim, and estDec+. These algorithms have been implemented and
tested on four real-life datasets: the Retail dataset, the Accidents dataset, the Chess dataset, and
the Mushrooms dataset. From the results, it has been observed that, for the Retail dataset, the
estDec+ algorithm is the fastest of all the algorithms in terms of run time and also consumes less
memory for its execution. The Pre-post+ algorithm performs better than all other algorithms in
terms of run time and maximum memory for the Mushrooms dataset. Pre-post outperforms the other
algorithms in terms of performance. For the Accidents dataset, the FIN method outperforms the other
algorithms in terms of execution time and memory consumption.

Keywords: Data mining, estDec+ algorithm, Frequent pattern mining, Pre-post+ algorithm

1. INTRODUCTION

The amount of data generated and collected from numerous sources has skyrocketed in recent years.
Data mining is an interdisciplinary academic field that has emerged to analyze massive amounts of data [1]. Many
data mining applications rely heavily on frequent patterns. Thus, to obtain higher algorithmic efficiency
and a better comprehension of the mined results, it is necessary to design algorithms for mining frequent patterns
and to learn the properties of the targeted data. To find viable applications for an algorithm, it is also necessary
to examine the performance of various data mining methods on diverse datasets [2]. The focus of this literature
survey is the analysis of different frequent pattern mining algorithms on various real datasets. Many strategies
for detecting frequent patterns have been proposed. The authors have taken different real-life datasets to
evaluate the performance of the algorithms in terms of running time and memory consumption.

The N-list data representation is derived from the pre-order post-order code (PPC) tree, a
frequent pattern (FP) tree-like coding prefix tree that preserves critical information about frequent item-sets
[3]. Algorithm performance is measured by running time and memory consumption. According to the results,
Pre-post is the fastest in the majority of cases. Pre-post+ is a high-performance approach for mining
frequent item-sets. It also includes a time-saving pruning technique known as children-parent equivalence
pruning, which significantly reduces the search space [4]. Rigorous tests on a variety of real datasets
compared Pre-post+ against three state-of-the-art algorithms: Pre-post, FIN, and FP-growth. According to the
results, Pre-post+ is the fastest in the majority of cases. Deng and Lv [5] proposed the Nodeset, a more
efficient data structure for mining frequent item-sets. Their paper compares FIN with Pre-post and FP-growth
on several real and synthetic datasets. The results show that FIN performs well in terms of both running time
and memory use. H-mine relies on H-struct, a simple and novel hyperlinked data structure, together with
mining algorithms built on it. The corresponding paper compares H-mine against the FP-growth and Apriori
algorithms in terms of running time and memory consumption [6]. The results reveal that H-mine performs
admirably with diverse types of data. Recursive elimination (Relim) is an algorithm for finding frequent
item-sets that is inspired by FP-growth and the H-mine algorithm [7]. It does not need prefix trees or any other
sophisticated data structures to process transactions [8]. Its main strength is the simplicity of its structure,
not its speed. The paper presents an evaluation of recursive elimination against FP-growth, Eclat, and Apriori
on various datasets. Recursive elimination, which is based on deleting items, recursive processing, and
reassigning transactions, performs well in tests. It is quick and straightforward to implement.
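
As an illustration of the N-list representation described at the start of this paragraph, the following minimal Python sketch builds a PPC-tree from a handful of transactions and reads off the N-list of each item as (pre-order, post-order, count) triples. The names `Node`, `build_ppc_tree`, and `n_lists`, the toy transactions, and the alphabetical tie-breaking rule are assumptions made for illustration only; this is not the implementation from [3].

```python
from collections import Counter


class Node:
    """One PPC-tree node: an item, its count, and its pre/post-order ranks."""

    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}
        self.pre = None
        self.post = None


def build_ppc_tree(transactions, min_count):
    """Insert transactions into a prefix tree, items ordered by descending frequency."""
    freq = Counter(item for t in transactions for item in set(t))
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by name
    order = {item: rank for rank, (item, c) in enumerate(ranked) if c >= min_count}
    root = Node(None)
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in order), key=order.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item)
            child.count += 1
            node = child
    return root


def n_lists(root):
    """Number nodes in pre- and post-order; return {item: [(pre, post, count), ...]}."""
    counters = {"pre": 0, "post": 0}
    result = {}

    def visit(node):
        node.pre = counters["pre"]
        counters["pre"] += 1
        for child in node.children.values():
            visit(child)
        node.post = counters["post"]
        counters["post"] += 1
        if node.item is not None:
            result.setdefault(node.item, []).append((node.pre, node.post, node.count))

    visit(root)
    # Each N-list is kept sorted by ascending pre-order rank.
    return {item: sorted(codes) for item, codes in result.items()}


transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["a", "c"]]
for item, codes in sorted(n_lists(build_ppc_tree(transactions, min_count=1)).items()):
    print(item, codes)
```

Because transactions that share a prefix share tree nodes, item `a` is represented by a single triple here even though it occurs in all four transactions.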

Using web clickstream datasets, Kaushal and Singh [9] conduct a comparison study of five of the most
important sequential pattern mining techniques. Algorithm performance is measured in terms
of execution time and memory consumption. Baralis et al. [10] present an approach that is economical and linearly
scalable on large databases for both sparse and dense data distributions. Moreover, it outperforms
FP-growth in terms of performance. Deng and Wang [11] present a novel vertical algorithm termed PPV
for fast frequent pattern discovery. Their paper compares PPV with FP-growth, and the experimental results
reveal that PPV is a good algorithm that outperforms FP-growth, Eclat, and dEclat. Deng et al. [12] described
a new data representation called NC-set, which keeps track of the complete information used for mining erasable
item-sets. Based on the NC-set, a new algorithm called MERIT has been proposed for mining erasable item-sets
efficiently. Deng [13] presented a new algorithm, NTK, to mine top-rank-k frequent patterns,
since mining top-rank-k frequent patterns is an emerging topic in frequent pattern mining.

a) Data mining 

While the data collected from diverse sources are often too large to be analyzed manually, computer
technology has substantially increased in terms of processing power and storage capacity [14]. As a result, using
computers to find information and knowledge in ever-increasing amounts of data has become
feasible, inexpensive, and crucial. These factors call for innovative approaches that convert
large quantities of target data into meaningful information and knowledge within a reasonable length of time.

As a result, a new research topic known as data mining, or knowledge discovery in databases, has
emerged. Data mining is the process of extracting nontrivial, previously unknown, and potentially useful
information from enormous amounts of data [15]. As an interdisciplinary research subject, it merges numerous
fields, including machine learning, information theory, and database systems.
Association rule mining, classification, clustering, regression, and outlier detection are some of the most
typical data mining tasks [16].

b) Frequent pattern mining 

Patterns that occur infrequently are pruned by frequent pattern mining tools, which consider them to be
unwanted or of little interest. Because of its usefulness in so many domains, data mining has attracted a lot of
interest in the database research community [17]. Frequent pattern mining (FPM) is a widely used data mining
technique that is useful for various applications, including association mining, correlations, sequential
item-sets, max-item-sets, partial periodicity, and emergent item-sets. Frequent patterns (FP) are item-sets,
subsequences, or substructures that appear in the target dataset with a frequency greater than a user-defined
threshold value [18].
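
To make the notion of a frequency threshold concrete, the sketch below counts the relative support of every item-set of up to two items in a toy transaction list and keeps those that reach a user-defined minimum support. The function name `frequent_itemsets`, the `max_size` parameter, the example baskets, and the use of "greater than or equal to" as the cut-off are assumptions for illustration; exhaustive enumeration like this is only practical for tiny inputs and is not one of the algorithms compared in this paper.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: relative support} for item-sets of up to max_size items."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))  # de-duplicate and fix an item order
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    # Keep only item-sets whose relative support reaches the threshold.
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
for itemset, support in sorted(frequent_itemsets(baskets, min_support=0.5).items()):
    print(itemset, support)
```

With a minimum support of 0.5, the pair (butter, milk) is discarded because it occurs in only one of the four baskets.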

Naik [19] described market basket analysis as a way to address the problem of mining association rules.
The fundamental purpose of discovering association rules is to predict consumer behaviour by identifying inherent
relationships between the many items that customers have purchased from retailers or supermarkets. The numerous
works on the fast mining of frequent patterns can be divided into two groups: the first category is
Apriori-based approaches [20], and the second category is FP-growth and tree-projection approaches [21]. In some
situations, these methods are still problematic [22]. Frequent pattern mining has been effectively applied
for enhanced decision support in a wide range of real-world applications, as shown in Figure 1.
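
As a small worked example of the association rules discussed above, the following sketch computes the two standard measures of a rule "antecedent implies consequent": its support (the fraction of transactions containing both sides) and its confidence (the fraction of transactions containing the antecedent that also contain the consequent). The function name `rule_metrics` and the toy baskets are assumptions for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_antecedent = sum(1 for t in transactions if a <= set(t))
    n_both = sum(1 for t in transactions if (a | c) <= set(t))
    support = n_both / n if n else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]
support, confidence = rule_metrics(baskets, ["bread"], ["milk"])
print(f"bread -> milk: support={support:.2f}, confidence={confidence:.2f}")
```

Here bread appears in three baskets and bread together with milk in two, so the rule has a support of 2/4 and a confidence of 2/3.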

With the globalization of trade, businesses encounter an increasing number of clients and transactions.
As a result, they must be aware of both risks and opportunities. Mining frequent patterns can aid in the creation
of promotions, discounts, retail layouts, targeted marketing, storage management, and market forecasting
[23]. In disaster prevention, analyzing numerous environmental parameters such as
temperature, humidity, and wind, especially imminent wind, can help forecast the weather and
avoid losses and casualties [24], [25].

Figure 1. Classification of frequent pattern mining algorithms


2. ABOUT THE DATASET USED FOR EXPERIMENTATION 

In this paper, we have used four public real-life datasets (Accidents, Chess, Mushrooms, and Retail) to
evaluate the mining algorithms, which include the Pre-post algorithm, Pre-post+ algorithm, FIN algorithm,
H-mine algorithm, Recursive elimination, and estDec+ algorithm. The datasets were downloaded from the UCI
repository and the FIMI repository. The Retail dataset comprises market basket data from an anonymous retail
store. It has 88,162 transactions. The Accidents dataset offers a wealth of information about different types of
accidents and their causes. It contains 340,183 transactions. The Mushrooms dataset offers
information about several types of mushrooms. It includes 8,416 transactions. The Chess dataset
contains different game steps together with the probability of winning and losing. It contains 3,196
transactions. These real-life datasets are used to determine which of the algorithms
take less execution time and consume less memory when mining the most frequent patterns.


3. METHOD AND PROPOSED WORK 

In this paper, we have conducted a comparative analysis of several recent frequent pattern mining
algorithms by running each of them on different datasets. The datasets used in our study are Chess,
Accidents, Mushrooms, and Retail. The evaluation is based on run time and memory consumption, and the
results are analyzed to conclude which of the selected algorithms performs better on
which kind of dataset. Figure 2 presents the process of experimentation.

Figure 2. Process of experimentation

The majority of previously suggested algorithms for mining frequent item-sets can be divided into
two groups: Apriori and FP-growth. Although several algorithms have been devised, how to build effective
mining algorithms remains one of the significant open research questions. This is because the
proposed approaches are challenged in terms of main memory requirements and efficiency in situations such as
dense vs. sparse and massive vs. memory-based datasets.
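
The evaluation procedure described in this section (run each algorithm on a dataset at a given minimum support and record run time and peak memory) can be sketched as a small measurement harness. This is a hypothetical illustration, not the tooling used for the experiments in this paper: the names `measure` and `frequent_items` are assumptions, the stand-in miner only finds frequent single items, and Python's `tracemalloc` tracks Python-level allocations only and adds some timing overhead.

```python
import time
import tracemalloc


def measure(miner, transactions, min_support):
    """Run miner(transactions, min_support); report pattern count, time, peak memory."""
    tracemalloc.start()
    start = time.perf_counter()
    patterns = miner(transactions, min_support)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "patterns": len(patterns),
        "run_time_ms": elapsed_ms,
        "peak_memory_mb": peak_bytes / (1024 * 1024),
    }


def frequent_items(transactions, min_support):
    """Stand-in miner used only to exercise the harness: frequent single items."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    return {item: c / n for item, c in counts.items() if c / n >= min_support}


data = [["a", "b"], ["a"], ["b", "c"]]
for min_support in (0.3, 0.5, 0.9):
    print(min_support, measure(frequent_items, data, min_support))
```

Any of the compared algorithms could be plugged in as `miner`, provided it accepts the same two arguments and returns a collection of patterns.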


4. RESULTS AND DISCUSSIONS 

To characterize the performance of a frequent pattern mining algorithm, the algorithm must be run
on various real and synthetic datasets. In this paper, experiments on real datasets are carried
out to verify the algorithms' performance. Four real datasets have been used to evaluate the performance of
several popular pattern mining methods: the Pre-post algorithm, Pre-post+ algorithm, FIN algorithm, H-mine
algorithm, Recursive elimination, and estDec+ algorithm. Table 1 presents the dataset parameters and
characteristics.


Table 1. Dataset parameters and characteristics

Dataset     Number of transactions   Distinct items   Size of a typical transaction   Source repository
Retail      88162                    16470            10.3                            UCI repository
Accidents   340183                   468              33.8                            FIMI repository
Mushrooms   8416                     119              23                              UCI repository
Chess       3196                     715              37                              UCI repository
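
The three dataset parameters in Table 1 can be derived directly from a transaction file. The sketch below assumes the plain-text layout commonly used for FIMI datasets, with one transaction per line and items separated by whitespace; the function name `dataset_stats` and the sample lines are hypothetical, and the sample is not taken from any of the four datasets.

```python
def dataset_stats(lines):
    """Return (number of transactions, distinct items, average transaction size)."""
    n_transactions = 0
    total_items = 0
    distinct = set()
    for line in lines:
        items = line.split()
        if not items:  # skip blank lines
            continue
        n_transactions += 1
        total_items += len(items)
        distinct.update(items)
    average = total_items / n_transactions if n_transactions else 0.0
    return n_transactions, len(distinct), average


sample = ["1 2 3", "2 3", "", "3 4 5 6"]
print(dataset_stats(sample))
```

Since an open file object yields lines in the same way as the list above, the same function can be applied to a file on disk without loading it into memory.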


4.1. Running time or time complexity 

The running time comparison of the algorithms on the different datasets is shown in Figures 3-6. Figure 3
shows the running time of the compared algorithms on the Retail dataset. Under all minimum supports, Pre-post
outperforms the other five algorithms. When the minimum support is large, estDec+ runs faster than
Pre-post+ and the other algorithms. Moreover, estDec+ is faster than all the other algorithms even when the
minimum support is no more than 50%. At a support of 0.05, Pre-post runs faster than all other algorithms.

Figure 4 displays the running time of the compared algorithms on the Accidents dataset. Pre-post is
the most efficient algorithm, and it runs faster than Pre-post+ and FIN. The FIN algorithm fails to discover all
frequent item-sets and runs out of memory when the minimum support is around 0.1 to 0.2. When the support is
0.1, Pre-post+ is still highly inefficient, taking over 10,000 s to complete. Pre-post+ outperforms both the
Pre-post and FIN algorithms at a support of 0.3. FIN fails to find all frequent item-sets in a reasonable amount
of time.

Figure 4. Running time comparison for accidents dataset

Figure 5 shows the running time of the compared algorithms on the Chess dataset. At a support of 100%,
Pre-post performs better than all the other algorithms. At a support of 0.95, Pre-post runs the fastest and
takes 264 s. At supports from 0.25 up to 0.35, Pre-post, FIN, and H-mine run out of memory and give no
result. At minimum supports of no more than 60%, Pre-post generally performs better than the others. H-mine
takes a long time to discover frequent item-sets.

Figure 6 displays the running time of the compared algorithms on the Mushrooms dataset. At a support
of 100%, Pre-post and Pre-post+ perform much better than the H-mine, R-Elim, and FIN algorithms. At supports
from 0.05 to 0.25, R-Elim runs out of memory and fails to find all frequent item-sets. At a support of 0.25,
R-Elim takes 392315 s, a much longer time for finding the frequent item-sets. Pre-post performs better than
the others at 50% support but takes 957 s to find the frequent item-sets.

Figure 5. Running time comparison for chess dataset

Figure 6. Running time comparison for mushrooms dataset


4.2. Maximum memory comparison result 

Figure 7 reveals the memory cost of the compared techniques on the Retail dataset, which is a sparse
dataset. The maximum memory consumption of the algorithms on the different datasets is shown in Figures 7-10.
estDec+ consumes four to five times less memory than H-mine and outperforms all other
algorithms. R-Elim uses around 1.2 times as much memory as H-mine on average. Pre-post and
Pre-post+ consume a large amount of memory, and the two perform almost identically.
R-Elim performs even better than these two algorithms but consumes more memory than H-mine.

Figure 8 illustrates the memory use of the compared methods on the Chess dataset. H-mine
consumes less memory than the other methods when the support is 100%. As the support decreases,
Pre-post+ becomes the worst, with a memory consumption of 780.4503 MB. At supports from 0.05 to 0.3,
Pre-post, FIN, and H-mine give an out-of-memory error. As the minimum support increases from 0.05 to 1, the
maximum memory usage decreases, giving good performance.

Figure 7. Memory usage comparison for retail dataset

Figure 8. Memory usage comparison for chess dataset


Figure 9 compares the memory usage of the algorithms on the Accidents dataset.
At a support of 100%, Pre-post+ performs much better than Pre-post, while FIN runs out of memory and fails
to find the frequent patterns. As the support increases from 0.1 onwards, the maximum memory usage
decreases; conversely, lowering the support drives memory usage up, which is undesirable. When the support is
no more than 60%, Pre-post performs better than the other two algorithms.

Figure 10 illustrates the memory cost of the different techniques on the Mushrooms dataset. H-mine
consumes two to three times less memory than the Pre-post, Pre-post+, and FIN algorithms, using about
5.3519 MB at a minimum support of 100%.

Figure 9. Memory usage comparison for accidents dataset

Figure 10. Memory usage comparison for mushrooms dataset


Recursive elimination consumes roughly the same amount of memory as H-mine, about
5.88574 MB. As the support decreases, for example from 100% to 10%, the memory usage of all algorithms
increases. H-mine, however, uses less memory than the other algorithms. Recursive elimination runs out
of memory at supports ranging from 0.05 to 0.2.

Considering the results in Figures 3-10, it can be concluded that five of the six algorithms
provide better results on two datasets, Retail and Mushrooms. For Accidents, Pre-post
consumes the least time and Pre-post+ consumes the least memory at 100% support. Likewise,
Pre-post consumes the least time for Chess, while H-mine consumes less memory. For
Mushrooms, Pre-post and Pre-post+ take the same time to produce frequent patterns. Tables 2 and 3 show the
results of the compared algorithms over the real-life datasets.

Table 2. Results of the compared algorithms over real-life datasets

Algorithm   Dataset     Run time (ms)   Maximum memory (MB)
Pre-post    Mushrooms   1510.85         53.74
            Chess       4547.5          89.23
            Accidents   15341.68        171.30
            Retail      979.05          38.63
Pre-post+   Mushrooms   1442.6          52.93
            Chess       17077.9         140.45
            Accidents   15380.63        171.77
            Retail      999.35          38.33
FIN         Mushrooms   1608.1          55.70
            Chess       8675.15         52.18
            Accidents   27309.10        211.11
            Retail      --              --
H-mine      Mushrooms   24838.75        88.89
            Chess       67977.5         133.69
            Accidents   --              --
            Retail      1162.1          11.96


Table 3. Results of the compared algorithms over real-life datasets

Algorithm   Dataset     Run time (ms)   Maximum memory (MB)
R-Elim      Mushrooms   36434.06        92.0545
            Chess       --              --
            Accidents   --              --
            Retail      1191.75         14.68
estDec+     Mushrooms   --              --
            Chess       --              --
            Accidents   --              --
            Retail      874.25          1.75


Tables 2 and 3 compare all the algorithms in terms of memory consumption and run time. In terms
of execution time, the estDec+ algorithm is the fastest of all the algorithms at every minimum support for the
Retail dataset. The Pre-post+ algorithm is the fastest for the Mushrooms dataset, whereas
Pre-post performs well for the Chess dataset. Finally, for the Accidents dataset, FIN performs much better than
Pre-post and Pre-post+. In terms of maximum memory, the estDec+ method is the most economical of all the
algorithms at every minimum support for the Retail dataset. The Pre-post+ algorithm uses the least memory
for the Mushrooms dataset, whereas for Chess, FIN is better than all other algorithms. Finally, for the
Accidents dataset, Pre-post and Pre-post+ perform almost the same, differing by only 0.47 MB (171.30 MB vs. 171.77 MB).
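
The per-dataset comparison can also be read off mechanically from the tabulated values. The sketch below transcribes the (run time in ms, maximum memory in MB) pairs from Tables 2 and 3, omits the entries reported as "--", and selects the algorithm with the smallest value per dataset. The names `RESULTS` and `best` are assumptions for illustration, and the sketch reflects only the single tabulated value per algorithm and dataset, not the behaviour at individual support levels shown in the figures.

```python
# (run time in ms, maximum memory in MB), transcribed from Tables 2 and 3.
RESULTS = {
    "Pre-post": {"Mushrooms": (1510.85, 53.74), "Chess": (4547.5, 89.23),
                 "Accidents": (15341.68, 171.30), "Retail": (979.05, 38.63)},
    "Pre-post+": {"Mushrooms": (1442.6, 52.93), "Chess": (17077.9, 140.45),
                  "Accidents": (15380.63, 171.77), "Retail": (999.35, 38.33)},
    "FIN": {"Mushrooms": (1608.1, 55.70), "Chess": (8675.15, 52.18),
            "Accidents": (27309.10, 211.11)},
    "H-mine": {"Mushrooms": (24838.75, 88.89), "Chess": (67977.5, 133.69),
               "Retail": (1162.1, 11.96)},
    "R-Elim": {"Mushrooms": (36434.06, 92.0545), "Retail": (1191.75, 14.68)},
    "estDec+": {"Retail": (874.25, 1.75)},
}

METRICS = {"run_time": 0, "memory": 1}


def best(dataset, metric):
    """Return the algorithm with the smallest tabulated value for the dataset."""
    index = METRICS[metric]
    candidates = {
        name: rows[dataset][index] for name, rows in RESULTS.items() if dataset in rows
    }
    return min(candidates, key=candidates.get)


for dataset in ("Retail", "Mushrooms", "Chess", "Accidents"):
    print(dataset, best(dataset, "run_time"), best(dataset, "memory"))
```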


5. CONCLUSION AND FUTURE WORK 

Pre-post, Pre-post+, FIN, H-mine, Recursive elimination, and estDec+ are among the most useful
frequent pattern mining algorithms and are the ones examined in this study. These algorithms have been
evaluated on four real-life datasets. Their running time and memory consumption are compared, and the results
are provided. It has been observed that, for the Accidents dataset, Pre-post and Pre-post+ consume the least time.
Pre-post uses a novel N-list data structure, derived from an FP-tree-like coding prefix tree called the PPC-tree,
that keeps the critical information about frequent item-sets and uses the least memory. Because transactions with
the same prefixes share the same nodes of a PPC-tree, the N-list is compact. The difference between the two
algorithms is minimal. Pre-post performs well because the counting of item-sets is transformed into the
intersection of N-lists, and an efficient method reduces the complexity of intersecting two N-lists to O(m+n),
where m and n are the lengths of the two N-lists. The storage cost of maintaining the N-lists of item-sets
depends on the dataset used. Because the dataset employed here is dense,
the storage cost is low.
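
The linear-time N-list intersection mentioned above can be illustrated with a two-pointer merge. The sketch assumes that each N-list is a list of (pre-order, post-order, count) triples sorted by ascending pre-order rank, that a node is an ancestor of another when its pre-order rank is smaller and its post-order rank is larger, and that the first argument belongs to the item closer to the root. The names `intersect_nlists` and `support` and the toy N-lists are hypothetical; this is a minimal sketch of the general idea, not the authors' implementation.

```python
def intersect_nlists(ancestor_list, descendant_list):
    """Intersect two N-lists in O(m + n); each pointer only ever moves forward."""
    result = []
    i = j = 0
    while i < len(ancestor_list) and j < len(descendant_list):
        a_pre, a_post, _ = ancestor_list[i]
        d_pre, d_post, d_count = descendant_list[j]
        if a_pre < d_pre:
            if a_post > d_post:
                # ancestor_list[i] is an ancestor of descendant_list[j]:
                # credit the descendant's count to the ancestor's code.
                if result and result[-1][:2] == (a_pre, a_post):
                    result[-1] = (a_pre, a_post, result[-1][2] + d_count)
                else:
                    result.append((a_pre, a_post, d_count))
                j += 1
            else:
                i += 1  # this node cannot be an ancestor of any later descendant
        else:
            j += 1  # this descendant has no ancestor left in ancestor_list
    return result


def support(nlist):
    """The support count of an item-set is the sum of the counts in its N-list."""
    return sum(count for _, _, count in nlist)


# Toy N-lists for items a, b, c of a small PPC-tree (hypothetical values).
n_a = [(1, 3, 4)]
n_b = [(2, 1, 3)]
n_c = [(3, 0, 2), (4, 2, 1)]
print(intersect_nlists(n_a, n_c), support(intersect_nlists(n_a, n_c)))
print(intersect_nlists(n_b, n_c), support(intersect_nlists(n_b, n_c)))
```

Each loop iteration advances one of the two pointers, so at most m + n iterations are needed, and no rescan of the transaction database is required to obtain the support of the combined item-set.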

Extending these algorithms to produce more efficient ways of mining frequent item-sets is an exciting
future direction for our research. Since the amount of data available is increasing at an exponential rate, using
these algorithms to extract frequent item-sets from big data is also an intriguing task. As a future extension
of this work, we plan to use these techniques to find the most frequent item-sets, and we will aim to combine
the ideas of all of the algorithms in the process of extracting patterns from large amounts of data.
Parallel and distributed implementations of these algorithms are another promising direction for the same
reason. Moreover, the work can be repeated on different datasets to compare running time and
memory consumption. It is also necessary to offer a novel data structure for extracting all frequent patterns from
transactional databases with a single database scan and without having to rescan the original database.